Study of Structure-Active Relationship for Inhibitors of HIV-1 Integrase LEDGF/p75 Interaction by Machine Learning Methods

Li, Y.; Wu, Y.B.; Yan, A.X.*
Molecular Informatics, 2017, 36(7), 1600127.

Read the original article

Download the raw data

Download Supporting Information

    HIV-1 integrase (IN) is a promising target for anti-AIDS therapy, and LEDGF/p75 has been shown to enhance the strand transfer activity of HIV-1 integrase in vitro. Blocking the interaction between IN and LEDGF/p75 is therefore an effective way to inhibit HIV replication and infection. In this work, 274 LEDGF/p75-IN inhibitors were collected as the dataset. Support Vector Machine (SVM), Decision Tree (DT), Function Tree (FT) and Random Forest (RF) algorithms were applied to build several computational models that predict whether a compound is a highly active or a weakly active LEDGF/p75-IN inhibitor. Each compound was represented by MACCS fingerprints (Models 1A to 4A) or CORINA Symphony descriptors (Models 1B to 4B). The test-set prediction accuracies of all models exceed 70%. The best model, Model 3B, built with FT on CORINA Symphony descriptors, achieved a test-set prediction accuracy of 81.08% and a Matthews Correlation Coefficient (MCC) of 0.62. Hydrogen bonding and hydrophobic interactions were found to be important for the bioactivity of an inhibitor.
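The workflow described in the abstract (binary fingerprints in, a two-class activity label out, accuracy and MCC on a held-out test set) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the fingerprints are random 166-bit vectors standing in for MACCS keys, the labels are synthetic, scikit-learn's `RandomForestClassifier` stands in for the RF implementation used in the paper, and the 200/74 train/test split is inferred from the granularity of the reported accuracies rather than stated on this page. The function name `run_sketch` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split


def run_sketch(seed=0):
    """Train a random forest on synthetic fingerprint-like data and score it."""
    rng = np.random.default_rng(seed)

    # ASSUMPTION: random 166-bit vectors stand in for real MACCS fingerprints.
    X = rng.integers(0, 2, size=(274, 166))
    # ASSUMPTION: synthetic labels (1 = highly active, 0 = weakly active),
    # derived from the first five bits so that there is something to learn.
    y = (X[:, :5].sum(axis=1) >= 3).astype(int)

    # ASSUMPTION: 200 training and 74 test compounds (inferred, not stated).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=74, stratify=y, random_state=seed
    )

    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)

    return {
        "n_train": len(y_train),
        "n_test": len(y_test),
        "accuracy": accuracy_score(y_test, y_pred),
        "mcc": matthews_corrcoef(y_test, y_pred),
    }


if __name__ == "__main__":
    print(run_sketch())
```

With real data, the synthetic `X` and `y` would be replaced by computed descriptors and measured activity classes; the scores printed here say nothing about the published models.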

Classification model performance on the dataset (274 LEDGF/p75-IN inhibitors)

Model     Algorithm  Descriptors  Train acc. (%)  5-fold CV (%)  10-fold CV (%)  LOO CV (%)  Test SE  Test SP  Test acc. (%)  Test MCC  Test AUC
Model 1A  SVM        MACCS        74.00           64.00          65.00           66.50       0.67     0.82     71.62          0.45      0.706
Model 2A  DT         MACCS        76.00           61.50          58.00           59.50       0.70     0.81     74.32          0.49      0.835
Model 3A  FT         MACCS        74.50           61.50          60.00           60.00       0.77     0.83     79.73          0.60      0.775
Model 4A  RF         MACCS        76.00           62.50          61.00           63.50       0.71     0.84     75.68          0.53      0.841
Model 1B  SVM        CORINA       81.00           75.50          75.50           75.00       0.79     0.77     78.38          0.57      0.783
Model 2B  DT         CORINA       78.00           63.00          65.00           69.00       0.84     0.70     75.68          0.53      0.784
Model 3B  FT         CORINA       83.50           68.50          70.50           70.00       0.82     0.80     81.08          0.62      0.831
Model 4B  RF         CORINA       80.50           65.00          63.50           65.50       0.79     0.77     78.38          0.57      0.837

CV: cross-validation accuracy on the training set; LOO: leave-one-out; SE: sensitivity; SP: specificity; MCC: Matthews correlation coefficient; AUC: area under the ROC curve.

Main project members

Li Yang (李杨)

PhD student